On the Dangers of Cross-Validation. An Experimental Evaluation
Authors
Abstract
Cross validation allows models to be tested using the full training set by means of repeated resampling, thus maximizing the total number of points used for testing and potentially helping to protect against overfitting. Improvements in computational power, recent reductions in the (computational) cost of classification algorithms, and the development of closed-form solutions (for performing cross validation in certain classes of learning algorithms) make it possible to test thousands or millions of variants of learning models on the data. Thus, it is now possible to calculate cross validation performance on a much larger number of tuned models than would have been possible otherwise. However, we show empirically that with such a large number of models the risk of overfitting increases and the performance estimated by cross validation is no longer an effective estimate of generalization; hence, this paper provides an empirical reminder of the dangers of cross validation. We use a closed-form solution that makes this evaluation possible for the cross validation problem of interest. In addition, through extensive experiments we expose and discuss the effects of the overuse and misuse of cross validation in several respects, including model selection, feature selection, and data dimensionality. These effects are illustrated on synthetic, benchmark, and real-world data sets.
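The selection effect described in the abstract can be reproduced in a small simulation. The sketch below is only an illustration under stated assumptions, not the paper's closed-form method: each candidate "model" is a one-dimensional nearest-class-mean classifier built on a pure-noise feature, so its true accuracy is 50% by construction. The function names `kfold_accuracy` and `best_cv_score`, the sample size, and the fold count are hypothetical choices made for this example.

```python
import numpy as np


def kfold_accuracy(x, y, k=5):
    """k-fold CV accuracy of a 1-D nearest-class-mean classifier."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # contiguous, near-equal folds
    correct = 0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # "Fit": class means estimated on the training part only.
        mu_pos = x[train & (y == 1)].mean()
        mu_neg = x[train & (y == 0)].mean()
        # Predict class 1 when the held-out point is closer to the class-1 mean.
        closer_to_pos = np.abs(x[test_idx] - mu_pos) < np.abs(x[test_idx] - mu_neg)
        correct += int((closer_to_pos.astype(int) == y[test_idx]).sum())
    return correct / n


def best_cv_score(n_models, n_samples=40, seed=0):
    """Best CV accuracy among n_models candidates built on pure noise."""
    rng = np.random.default_rng(seed)
    # Balanced, alternating labels so every training split contains both classes.
    y = np.tile([0, 1], n_samples // 2)
    # Assumption: features are drawn independently of y, so no candidate
    # can truly do better than chance (50%).
    scores = [
        kfold_accuracy(rng.standard_normal(n_samples), y)
        for _ in range(n_models)
    ]
    return max(scores)


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(f"{m:>5} candidate models -> best CV accuracy {best_cv_score(m):.3f}")
```

Because every candidate is at chance level on new data, any gap between the best cross validation score and 0.5 comes purely from selecting among many candidates. In this setup the reported best score can only grow as more candidates are tried, which is the mechanism the abstract warns about.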
Similar resources
Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm
Several radial basis function based methods contain a free shape parameter which has a crucial role in the accuracy of the methods. Performance evaluation of this parameter in different functions with various data has always been a topic of study. In the present paper, we consider studying the methods which determine an optimal value for the shape parameter in interpolations of radial basis ...
Validation of the Early Feeding Skills Assessment Scale for the Evaluation of Oral Feeding in Premature Infants
Background: Feeding difficulties are common and important in premature infants. In order to identify neonatal feeding difficulties, clinicians and nurses require assessment tools to conduct an objective evaluation of infant oral feeding (breast/bottle-feeding). Early identification of infants with feeding difficulty is critical to implement appropriate therapies and op...
Development and Validation of Attitude toward Gestational Surrogacy Scale in Iranian Infertile Couples
Objective Surrogacy is one of the most challenging infertility treatments engaging ethical, psychological and social issues. Attitudes survey plays an important role to disclosure variant aspects of surrogacy, to help meeting legislative gaps and ambiguities, and to convert controversial dimensions surrounding surrogacy to a normative concept that eliminates stigma. The aim of this study is to ...
Synthesis and Experimental-Modelling Evaluation of Nanoparticles Movements by Novel Surfactant on Water Injection: An Approach on Mechanical Formation Damage Control and Pore Size Distribution
Water injection is used as a widespread IOR/EOR method and promising formation damages (especially mechanical ones) is a crucial challenge in the near-wellbore of injection wells. The magnesium oxide (MgO) NanoParticles (NPs) considered in the article underwater flooding experiment tests to monitor the promising mechanical formation damage (size exclusion) in lab mechanistic scale include m...
The Development and Validation of New Equations for Prediction of the Performance of Tangential Cyclones
New equations have been developed to predict the effect of geometrical dimensions of tangential cyclones on their operational performances. To check the validity of the derived equations, an experimental apparatus was set up and some experimental work was performed. It was observed that the experimental results confirm properly the theoretical predictions.
Publication date: 2008